{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "name": "AlphaFold2_batch.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "language": "python",
      "display_name": "Python 3 (ipykernel)"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/batch/AlphaFold2_batch.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G4yBrceuFbf3"
      },
      "source": [
        "#ColabFold: AlphaFold2 w/ MMseqs2 BATCH\n",
        "\n",
        "<img src=\"https://raw.githubusercontent.com/sokrypton/ColabFold/main/.github/ColabFold_Marv_Logo_Small.png\" height=\"256\" align=\"right\" style=\"height:256px\">\n",
        "\n",
        "Easy to use AlphaFold2 protein structure [(Jumper et al. 2021)](https://www.nature.com/articles/s41586-021-03819-2) and complex [(Evans et al. 2021)](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1) prediction using multiple sequence alignments generated through MMseqs2. For details, refer to our manuscript:\n",
        "\n",
        "[Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: Making protein folding accessible to all.\n",
        "*Nature Methods*, 2022](https://www.nature.com/articles/s41592-022-01488-1) \n",
        "\n",
        "**Usage**\n",
        "\n",
        "`input_dir` directory with only fasta files or MSAs stored in Google Drive. MSAs need to be A3M formatted and have an `.a3m` extention. For MSAs MMseqs2 will not be called.\n",
        "\n",
        "`result_dir` results will be written to the result directory in Google Drive\n",
        "\n",
        "Old versions: [v1.0-alpha](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.0-alpha/batch/AlphaFold2_batch.ipynb), [v1.1-permultimer](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.1-premultimer//batch/AlphaFold2_batch.ipynb), [v1.2](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.2.0/batch/AlphaFold2_batch.ipynb), [v1.3](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.3.0/batch/AlphaFold2_batch.ipynb)\n",
        "\n",
        "<strong>For more details, see <a href=\"#Instructions\">bottom</a> of the notebook and checkout the [ColabFold GitHub](https://github.com/sokrypton/ColabFold). </strong>"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "AwvIWN3HDyUJ",
        "cellView": "form"
      },
      "source": [
        "#@title Mount google drive\n",
        "from google.colab import drive\n",
        "drive.mount('/content/drive')\n",
        "from sys import version_info \n",
        "python_version = f\"{version_info.major}.{version_info.minor}\""
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "kOblAo-xetgx",
        "cellView": "form"
      },
      "source": [
        "#@title Input protein sequence, then hit `Runtime` -> `Run all`\n",
        "\n",
        "input_dir = '/content/drive/MyDrive/input_fasta' #@param {type:\"string\"}\n",
        "result_dir = '/content/drive/MyDrive/result' #@param {type:\"string\"}\n",
        "\n",
        "# number of models to use\n",
        "#@markdown ---\n",
        "#@markdown ### Advanced settings\n",
        "msa_mode = \"MMseqs2 (UniRef+Environmental)\" #@param [\"MMseqs2 (UniRef+Environmental)\", \"MMseqs2 (UniRef only)\",\"single_sequence\",\"custom\"]\n",
        "num_models = 5 #@param [1,2,3,4,5] {type:\"raw\"}\n",
        "num_recycles = 3 #@param [1,3,6,12,24,48] {type:\"raw\"}\n",
        "stop_at_score = 100 #@param {type:\"string\"}\n",
        "#@markdown - early stop computing models once score > threshold (avg. plddt for \"structures\" and ptmscore for \"complexes\")\n",
        "use_custom_msa = False\n",
        "use_amber = False #@param {type:\"boolean\"}\n",
        "use_templates = False #@param {type:\"boolean\"}\n",
        "do_not_overwrite_results = True #@param {type:\"boolean\"}\n",
        "zip_results = False #@param {type:\"boolean\"}\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "iccGdbe_Pmt9",
        "cellView": "form"
      },
      "source": [
        "#@title Install dependencies\n",
        "%%bash -s $use_amber $use_templates $python_version\n",
        "\n",
        "set -e\n",
        "\n",
        "USE_AMBER=$1\n",
        "USE_TEMPLATES=$2\n",
        "PYTHON_VERSION=$3\n",
        "\n",
        "if [ ! -f COLABFOLD_READY ]; then\n",
        "  # install dependencies\n",
        "  # We have to use \"--no-warn-conflicts\" because colab already has a lot preinstalled with requirements different to ours\n",
        "  pip install -q --no-warn-conflicts \"colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold\"\n",
        "  # high risk high gain\n",
        "  pip install -q \"jax[cuda11_cudnn805]>=0.3.8,<0.4\" -f https://storage.googleapis.com/jax-releases/jax_releases.html\n",
        "  touch COLABFOLD_READY\n",
        "fi\n",
        "\n",
        "# Download params (~1min)\n",
        "python -m colabfold.download\n",
        "\n",
        "# setup conda\n",
        "if [ ${USE_AMBER} == \"True\" ] || [ ${USE_TEMPLATES} == \"True\" ]; then\n",
        "  if [ ! -f CONDA_READY ]; then\n",
        "    wget -qnc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\n",
        "    bash Miniconda3-latest-Linux-x86_64.sh -bfp /usr/local 2>&1 1>/dev/null\n",
        "    rm Miniconda3-latest-Linux-x86_64.sh\n",
        "    touch CONDA_READY\n",
        "  fi\n",
        "fi\n",
        "# setup template search\n",
        "if [ ${USE_TEMPLATES} == \"True\" ] && [ ! -f HH_READY ]; then\n",
        "  conda install -y -q -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 python=\"${PYTHON_VERSION}\" 2>&1 1>/dev/null\n",
        "  touch HH_READY\n",
        "fi\n",
        "# setup openmm for amber refinement\n",
        "if [ ${USE_AMBER} == \"True\" ] && [ ! -f AMBER_READY ]; then\n",
        "  conda install -y -q -c conda-forge openmm=7.5.1 python=\"${PYTHON_VERSION}\" pdbfixer 2>&1 1>/dev/null\n",
        "  touch AMBER_READY\n",
        "fi"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "hUYApPElB30u",
        "cellView": "form"
      },
      "source": [
        "#@title Run Prediction\n",
        "\n",
        "import sys\n",
        "\n",
        "from colabfold.batch import get_queries, run\n",
        "from colabfold.download import default_data_dir\n",
        "from colabfold.utils import setup_logging\n",
        "from pathlib import Path\n",
        "\n",
        "# For some reason we need that to get pdbfixer to import\n",
        "if use_amber and f\"/usr/local/lib/python{python_version}/site-packages/\" not in sys.path:\n",
        "    sys.path.insert(0, f\"/usr/local/lib/python{python_version}/site-packages/\")\n",
        "\n",
        "if 'logging_setup' not in globals():\n",
        "    setup_logging(Path(result_dir).joinpath(\"log.txt\"))\n",
        "    logging_setup = True\n",
        "\n",
        "queries, is_complex = get_queries(input_dir)\n",
        "run(\n",
        "    queries=queries,\n",
        "    result_dir=result_dir,\n",
        "    use_templates=use_templates,\n",
        "    use_amber=use_amber,\n",
        "    msa_mode=msa_mode,\n",
        "    model_type=\"auto\",\n",
        "    num_models=num_models,\n",
        "    num_recycles=num_recycles,\n",
        "    model_order=[3, 4, 5, 1, 2],\n",
        "    is_complex=is_complex,\n",
        "    data_dir=default_data_dir,\n",
        "    keep_existing_results=do_not_overwrite_results,\n",
        "    rank_by=\"auto\",\n",
        "    pair_mode=\"unpaired+paired\",\n",
        "    stop_at_score=stop_at_score,\n",
        "    zip_results=zip_results,\n",
        ")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UGUBLzB3C6WN"
      },
      "source": [
        "# Instructions <a name=\"Instructions\"></a>\n",
        "**Quick start**\n",
        "1. Upload your single fasta files to a folder in your Google Drive\n",
        "2. Define path to the fold containing the fasta files (`input_dir`) define an outdir (`output_dir`)\n",
        "3. Press \"Runtime\" -> \"Run all\".\n",
        "\n",
        "**Result zip file contents**\n",
        "\n",
        "At the end of the job a all results `jobname.result.zip` will be uploaded to your (`output_dir`) Google Drive. Each zip contains one protein.\n",
        "\n",
        "1. PDB formatted structures sorted by avg. pIDDT. (unrelaxed and relaxed if `use_amber` is enabled).\n",
        "2. Plots of the model quality.\n",
        "3. Plots of the MSA coverage.\n",
        "4. Parameter log file.\n",
        "5. A3M formatted input MSA.\n",
        "6. BibTeX file with citations for all used tools and databases.\n",
        "\n",
        "\n",
        "**Troubleshooting**\n",
        "* Check that the runtime type is set to GPU at \"Runtime\" -> \"Change runtime type\".\n",
        "* Try to restart the session \"Runtime\" -> \"Factory reset runtime\".\n",
        "* Check your input sequence.\n",
        "\n",
        "**Known issues**\n",
        "* Google Colab assigns different types of GPUs with varying amount of memory. Some might not have enough memory to predict the structure for a long sequence.\n",
        "* Google Colab assigns different types of GPUs with varying amount of memory. Some might not have enough memory to predict the structure for a long sequence.\n",
        "* Your browser can block the pop-up for downloading the result file. You can choose the `save_to_google_drive` option to upload to Google Drive instead or manually download the result file: Click on the little folder icon to the left, navigate to file: `jobname.result.zip`, right-click and select \\\"Download\\\" (see [screenshot](https://pbs.twimg.com/media/E6wRW2lWUAEOuoe?format=jpg&name=small)).\n",
        "\n",
        "**Limitations**\n",
        "* Computing resources: Our MMseqs2 API can handle ~20-50k requests per day.\n",
        "* MSAs: MMseqs2 is very precise and sensitive but might find less hits compared to HHblits/HMMer searched against BFD or Mgnify.\n",
        "* We recommend to additionally use the full [AlphaFold2 pipeline](https://github.com/deepmind/alphafold).\n",
        "\n",
        "**Description of the plots**\n",
        "*   **Number of sequences per position** - We want to see at least 30 sequences per position, for best performance, ideally 100 sequences.\n",
        "*   **Predicted lDDT per position** - model confidence (out of 100) at each position. The higher the better.\n",
        "*   **Predicted Alignment Error** - For homooligomers, this could be a useful metric to assess how confident the model is about the interface. The lower the better.\n",
        "\n",
        "**Bugs**\n",
        "- If you encounter any bugs, please report the issue to https://github.com/sokrypton/ColabFold/issues\n",
        "\n",
        "**License**\n",
        "\n",
        "The source code of ColabFold is licensed under [MIT](https://raw.githubusercontent.com/sokrypton/ColabFold/main/LICENSE). Additionally, this notebook uses AlphaFold2 source code and its parameters licensed under [Apache 2.0](https://raw.githubusercontent.com/deepmind/alphafold/main/LICENSE) and  [CC BY 4.0](https://creativecommons.org/licenses/by-sa/4.0/) respectively. Read more about the AlphaFold license [here](https://github.com/deepmind/alphafold).\n",
        "\n",
        "**Acknowledgments**\n",
        "- We thank the AlphaFold team for developing an excellent model and open sourcing the software. \n",
        "\n",
        "- [Söding Lab](https://www.mpibpc.mpg.de/soeding) for providing the computational resources for the MMseqs2 server\n",
        "\n",
        "- Do-Yoon Kim for creating the ColabFold logo.\n",
        "\n",
        "- A colab by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger)).\n"
      ]
    }
  ]
}